memory load
Redundancy Maximization as a Principle of Associative Memory Learning
Blümel, Mark, Schneider, Andreas C., Neuhaus, Valentin, Ehrlich, David A., Graetz, Marcel, Wibral, Michael, Makkeh, Abdullah, Priesemann, Viola
Associative memory, traditionally modeled by Hopfield networks, enables the retrieval of previously stored patterns from partial or noisy cues. Yet, the local computational principles which are required to enable this function remain incompletely understood. To formally characterize the local information processing in such systems, we employ a recent extension of information theory - Partial Information Decomposition (PID). PID decomposes the contribution of different inputs to an output into unique information from each input, redundant information across inputs, and synergistic information that emerges from combining different inputs. Applying this framework to individual neurons in classical Hopfield networks we find that below the memory capacity, the information in a neuron's activity is characterized by high redundancy between the external pattern input and the internal recurrent input, while synergy and unique information are close to zero until the memory capacity is surpassed and performance drops steeply. Inspired by this observation, we use redundancy as an information-theoretic learning goal, which is directly optimized for each neuron, dramatically increasing the network's memory capacity to 1.59, a more than tenfold improvement over the 0.14 capacity of classical Hopfield networks and even outperforming recent state-of-the-art implementations of Hopfield networks. Ultimately, this work establishes redundancy maximization as a new design principle for associative memories and opens pathways for new associative memory models based on information-theoretic goals.
- North America > United States (0.14)
- Europe > Germany > Lower Saxony > Gottingen (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- (2 more...)
Cross-Linguistic Analysis of Memory Load in Sentence Comprehension: Linear Distance and Structural Density
This study examines whether sentence-level memory load in comprehension is better explained by linear proximity between syntactically related words or by the structural density of the intervening material. Building on locality-based accounts and cross-linguistic evidence for dependency length minimization, the work advances Intervener Complexity-the number of intervening heads between a head and its dependent-as a structurally grounded lens that refines linear distance measures. Using harmonized dependency treebanks and a mixed-effects framework across multiple languages, the analysis jointly evaluates sentence length, dependency length, and Intervener Complexity as predictors of the Memory-load measure. Studies in Psycholinguistics have reported the contributions of feature interference and misbinding to memory load during processing. For this study, I operationalized sentence-level memory load as the linear sum of feature misbinding and feature interference for tractability; current evidence does not establish that their cognitive contributions combine additively. All three factors are positively associated with memory load, with sentence length exerting the broadest influence and Intervener Complexity offering explanatory power beyond linear distance. Conceptually, the findings reconcile linear and hierarchical perspectives on locality by treating dependency length as an important surface signature while identifying intervening heads as a more proximate indicator of integration and maintenance demands. Methodologically, the study illustrates how UD-based graph measures and cross-linguistic mixed-effects modelling can disentangle linear and structural contributions to processing efficiency, providing a principled path for evaluating competing theories of memory load in sentence comprehension.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > India (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.48)
Efficient Memory Management for GPU-based Deep Learning Systems
Zhang, Junzhe, Yeung, Sai Ho, Shu, Yao, He, Bingsheng, Wang, Wei
GPU (graphics processing unit) has been used for many data-intensive applications. Among them, deep learning systems are one of the most important consumer systems for GPU nowadays. As deep learning applications impose deeper and larger models in order to achieve higher accuracy, memory management becomes an important research topic for deep learning systems, given that GPU has limited memory size. Many approaches have been proposed towards this issue, e.g., model compression and memory swapping. However, they either degrade the model accuracy or require a lot of manual intervention. In this paper, we propose two orthogonal approaches to reduce the memory cost from the system perspective. Our approaches are transparent to the models, and thus do not affect the model accuracy. They are achieved by exploiting the iterative nature of the training algorithm of deep learning to derive the lifetime and read/write order of all variables. With the lifetime semantics, we are able to implement a memory pool with minimal fragments. However, the optimization problem is NP-complete. We propose a heuristic algorithm that reduces up to 13.3% of memory compared with Nvidia's default memory pool with equal time complexity. With the read/write semantics, the variables that are not in use can be swapped out from GPU to CPU to reduce the memory footprint. We propose multiple swapping strategies to automatically decide which variable to swap and when to swap out (in), which reduces the memory cost by up to 34.2% without communication overhead.
- North America > United States > New York > New York County > New York City (0.04)
- South America > Peru > Loreto Department (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
Heuristics for Ordering Cue Search in Decision Making
Todd, Peter M., Dieckmann, Anja
Simple lexicographic decision heuristics that consider cues one at a time in a particular order and stop searching for cues as soon as a decision can be made have been shown to be both accurate and frugal in their use of information. But much of the simplicity and success of these heuristics comes from using an appropriate cue order. For instance, the Take The Best heuristic uses validity order for cues, which requires considerable computation, potentially undermining the computational advantages of the simple decision mechanism. But many cue orders can achieve good decision performance, and studies of sequential search for data records have proposed a number of simple ordering rules that may be of use in constructing appropriate decision cue orders as well. Here we consider a range of simple cue ordering mechanisms, including tallying, swapping, and move-to-front rules, and show that they can find cue orders that lead to reasonable accuracy and considerable frugality when used with lexicographic decision heuristics.
- North America > United States > New York (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
Heuristics for Ordering Cue Search in Decision Making
Todd, Peter M., Dieckmann, Anja
Simple lexicographic decision heuristics that consider cues one at a time in a particular order and stop searching for cues as soon as a decision can be made have been shown to be both accurate and frugal in their use of information. But much of the simplicity and success of these heuristics comes from using an appropriate cue order. For instance, the Take The Best heuristic uses validity order for cues, which requires considerable computation, potentially undermining the computational advantages of the simple decision mechanism. But many cue orders can achieve good decision performance, and studies of sequential search for data records have proposed a number of simple ordering rules that may be of use in constructing appropriate decision cue orders as well. Here we consider a range of simple cue ordering mechanisms, including tallying, swapping, and move-to-front rules, and show that they can find cue orders that lead to reasonable accuracy and considerable frugality when used with lexicographic decision heuristics.
- North America > United States > New York (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
Bidirectional Retrieval from Associative Memory
Sommer, Friedrich T., Palm, Günther
Similarity based fault tolerant retrieval in neural associative memories (NAM) has not lead to wiedespread applications. A drawback of the efficient Willshaw model for sparse patterns [Ste61, WBLH69], is that the high asymptotic information capacity is of little practical use because of high cross talk noise arising in the retrieval for finite sizes. Here a new bidirectional iterative retrieval method for the Willshaw model is presented, called crosswise bidirectional (CB)retrieval, providing enhanced performance. We discuss its asymptotic capacity limit, analyze the first step, and compare itin experiments with the Willshaw model. Applying the very efficient CB memory model either in information retrieval systems or as a functional model for reciprocal cortico-cortical pathways requires more than robustness against random noise in the input: Our experiments show also the segmentation ability of CB-retrieval with addresses containing the superposition of pattens, provided even at high memory load. 1 INTRODUCTION From a technical point of view neural associative memories (NAM) provide data storage and retrieval.
- North America > United States (0.14)
- Europe > Germany (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.84)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.84)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Bidirectional Retrieval from Associative Memory
Sommer, Friedrich T., Palm, Günther
Similarity based fault tolerant retrieval in neural associative memories (N AM) has not lead to wiedespread applications. A drawback of the efficient Willshaw model for sparse patterns [Ste61, WBLH69], is that the high asymptotic information capacity is of little practical use because of high cross talk noise arising in the retrieval for finite sizes. Here a new bidirectional iterative retrieval method for the Willshaw model is presented, called crosswise bidirectional (CB) retrieval, providing enhanced performance. We discuss its asymptotic capacity limit, analyze the first step, and compare it in experiments with the Willshaw model. Applying the very efficient CB memory model either in information retrieval systems or as a functional model for reciprocal cortico-cortical pathways requires more than robustness against random noise in the input: Our experiments show also the segmentation ability of CB-retrieval with addresses containing the superposition of pattens, provided even at high memory load.
- North America > United States (0.14)
- Europe > Germany (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.65)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.65)
Bidirectional Retrieval from Associative Memory
Sommer, Friedrich T., Palm, Günther
Similarity based fault tolerant retrieval in neural associative memories (N AM) has not lead to wiedespread applications. A drawback of the efficient Willshaw model for sparse patterns [Ste61, WBLH69], is that the high asymptotic information capacity is of little practical use because of high cross talk noise arising in the retrieval for finite sizes. Here a new bidirectional iterative retrieval method for the Willshaw model is presented, called crosswise bidirectional (CB) retrieval, providing enhanced performance. We discuss its asymptotic capacity limit, analyze the first step, and compare it in experiments with the Willshaw model. Applying the very efficient CB memory model either in information retrieval systems or as a functional model for reciprocal cortico-cortical pathways requires more than robustness against random noise in the input: Our experiments show also the segmentation ability of CB-retrieval with addresses containing the superposition of pattens, provided even at high memory load.
- North America > United States (0.14)
- Europe > Germany (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.65)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.65)
Optimal Signalling in Attractor Neural Networks
Meilijson, Isaac, Ruppin, Eytan
It is well known that a given cortical neuron can respond with a different firing pattern forthe same synaptic input, depending on its firing history and on the effects of modulator transmitters (see [Connors and Gutnick, 1990] for a review). The time span of different channel conductances is very broad, and the influence of some ionic currents varies with the history of the membrane potential [Lytton, 1991]. Motivated bythe history-dependent nature of neuronal firing, we continue .our
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
- North America > United States > Maryland (0.04)
Optimal Signalling in Attractor Neural Networks
Meilijson, Isaac, Ruppin, Eytan
It is well known that a given cortical neuron can respond with a different firing pattern for the same synaptic input, depending on its firing history and on the effects of modulator transmitters (see [Connors and Gutnick, 1990] for a review). The time span of different channel conductances is very broad, and the influence of some ionic currents varies with the history of the membrane potential [Lytton, 1991]. Motivated by the history-dependent nature of neuronal firing, we continue.our
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
- North America > United States > Maryland (0.04)